Applications in Computer Vision
projecting the real-valued (32-bit) variable $x$ onto a set as
$$Q = \{a_1, a_2, \cdots, a_n\}, \qquad (6.1)$$
where $Q$ is a discrete set and $n$ is the number of elements in $Q$. For example, $n$ is set to $2^{16}$ when performing 16-bit quantization. Then, we define the projection of $x \in \mathbb{R}$ onto the set $Q$ as
$$P_{\mathbb{R}\to Q}(x) =
\begin{cases}
a_1, & x < \frac{a_1 + a_2}{2} \\
\quad\cdots \\
a_i, & \frac{a_{i-1} + a_i}{2} \le x < \frac{a_i + a_{i+1}}{2} \\
\quad\cdots \\
a_n, & \frac{a_{n-1} + a_n}{2} \le x.
\end{cases} \qquad (6.2)$$
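The projection of Eq. 6.2 simply snaps each real value to the nearest level in $Q$, with the midpoints between adjacent levels acting as decision boundaries. A minimal sketch (the helper name `project_to_q` and the example levels are our own, not from the text):

```python
import numpy as np

def project_to_q(x, levels):
    """Project real values x onto the nearest level in Q = {a_1, ..., a_n},
    following the midpoint rule of Eq. 6.2."""
    levels = np.sort(np.asarray(levels, dtype=np.float64))
    # Midpoints (a_i + a_{i+1}) / 2 split the real line into n intervals.
    midpoints = (levels[:-1] + levels[1:]) / 2.0
    # searchsorted finds the interval that contains each x, i.e., its a_i;
    # side="right" assigns a boundary point to the upper level, as in Eq. 6.2.
    idx = np.searchsorted(midpoints, x, side="right")
    return levels[idx]

x = np.array([-0.7, -0.1, 0.2, 0.9])
q = project_to_q(x, levels=[-1.0, 0.0, 1.0])
# q == [-1., 0., 0., 1.]
```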
By projecting 32-bit weights and activations onto low-bit sets, the computational cost is greatly reduced. In the extreme case, binarizing the weights and activations of neural networks decreases the storage and computation cost by 32× and 64×, respectively.
Considering the binarization process of BNNs, Eqs. 6.1 and 6.2 are relaxed into
$$P_{\mathbb{R}\to B}(x) =
\begin{cases}
-1, & x < 0 \\
+1, & 0 \le x
\end{cases}, \quad \text{s.t. } B = \{-1, +1\}, \qquad (6.3)$$
where we set $a_1 = -1$ and $a_2 = +1$. Then $P_{\mathbb{R}\to B}(\cdot)$ is equivalent to the sign function, i.e., $\mathrm{sign}(\cdot)$.
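Concretely, Eq. 6.3 matches the sign function up to the convention at zero: Eq. 6.3 maps $x = 0$ to $+1$, whereas `np.sign(0)` returns 0. A small sketch of this equivalence (example values are our own):

```python
import numpy as np

x = np.array([-0.3, 0.0, 2.5])
# P_{R->B} from Eq. 6.3: -1 for x < 0, +1 for 0 <= x.
b = np.where(x < 0, -1.0, 1.0)
# np.sign maps 0 to 0; fixing the convention sign(0) = +1 recovers P_{R->B}.
s = np.sign(x)
s[s == 0] = 1.0
# b == s == [-1., 1., 1.]
```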
The learning objective of conventional BNNs (XNOR-Net) is defined to minimize the geometric distance between $x$ and $P_{\mathbb{R}\to B}(x)$ as
$$\arg\min_{x,\alpha} \|x - \alpha P_{\mathbb{R}\to B}(x)\|_2^2, \qquad (6.4)$$
where $\alpha$ is an auxiliary scale factor. Recent works on binarized neural networks (BNNs) [199, 159] solve this objective explicitly as
$$\alpha = \frac{\|x\|_1}{\mathrm{size}(x)}, \qquad (6.5)$$
where size(x) denotes the number of elements in x. However, this objective is insufficient to
maintain the information of the real-valued counterpart x. To overcome this shortcoming,
we introduce the kernel refining convolution.
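The closed-form solution of Eqs. 6.4 and 6.5 amounts to taking the sign of $x$ and rescaling by the mean absolute value. A minimal XNOR-Net-style sketch (the helper name `xnor_binarize` is our own):

```python
import numpy as np

def xnor_binarize(x):
    """Closed-form solution of Eq. 6.4: binarize x with P_{R->B}
    and scale by alpha = ||x||_1 / size(x) (Eq. 6.5)."""
    alpha = np.abs(x).sum() / x.size           # Eq. 6.5: mean absolute value
    b = np.where(x < 0, -1.0, 1.0)             # P_{R->B}(x), Eq. 6.3
    return alpha, b

x = np.array([0.5, -1.5, 2.0, -1.0])
alpha, b = xnor_binarize(x)
# alpha == 1.25, b == [1., -1., 1., -1.]
```

The scaled approximation $\alpha\, b$ is the best least-squares fit to $x$ among all vectors of the form $\alpha P_{\mathbb{R}\to B}(x)$, which is exactly why, as the text notes, it can still lose information about the real-valued $x$.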
Furthermore, XNOR-Net, like most BNNs, suffers from intra-channel feature homogenization, which degrades the feature representation capacity. Hence, a new feature refinement method should be introduced.
6.2.2 Kernel Refining Generative Adversarial Learning (KR-GAL)
Given a conventional CNN model, we denote $w_i \in \mathbb{R}^{n_i}$ and $a_i \in \mathbb{R}^{m_i}$ as its weights and feature maps in the $i$-th layer, where $n_i = C_i \cdot C_{i-1} \cdot K_i \cdot K_i$ and $m_i = C_i \cdot W_i \cdot H_i$. Here $C_i$ represents the number of output channels of the $i$-th layer, $(W_i, H_i)$ are the width and height of the feature maps, and $K_i$ is the kernel size. Then we have
$$a_i = a_{i-1} \otimes w_i, \qquad (6.6)$$
where $\otimes$ is the convolution operation. As mentioned above, the BNN model aims to binarize $w_i$ and $a_i$ into $P_{\mathbb{R}\to B}(w_i)$ and $P_{\mathbb{R}\to B}(a_i)$. For simplicity, in this chapter we denote $P_{\mathbb{R}\to B}(w_i)$ and $P_{\mathbb{R}\to B}(a_i)$ as $bw_i \in \mathbb{B}^{n_i}$ and $ba_i \in \mathbb{B}^{m_i}$, respectively.
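The dimension bookkeeping above can be made concrete with a small sketch (the layer sizes below are hypothetical example values, not from the text):

```python
import numpy as np

# Notation of the chapter for layer i (example sizes are our own):
C_i, C_im1, K_i = 16, 8, 3       # output channels, input channels, kernel size
W_i, H_i = 32, 32                # feature-map width and height

n_i = C_i * C_im1 * K_i * K_i    # number of weight elements:     w_i in R^{n_i}
m_i = C_i * W_i * H_i            # number of activation elements: a_i in R^{m_i}

# Binarization maps every element to {-1, +1}:
# bw_i in B^{n_i}, ba_i in B^{m_i}.
w_i = np.random.randn(n_i)
bw_i = np.where(w_i < 0, -1.0, 1.0)
# n_i == 1152, m_i == 16384, and bw_i only contains -1 and +1.
```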